Stability and optimality in stochastic gradient descent

Authors

Panos Toulis

Dustin Tran

Edoardo M. Airoldi

Abstract

Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. However, in both theory and practice, they suffer from numerical instability. Moreover, they are statistically inefficient as estimators of the true parameter value. To address these two issues, we propose a new iterative procedure termed AISGD. For statistical efficiency, AISGD employs averaging of the iterates, which achieves the optimal Cramér-Rao bound under strong convexity, i.e., it is an optimal unbiased estimator of the true parameter value. For numerical stability, AISGD employs an implicit update at each iteration, which is related to proximal operators in optimization. In practice, AISGD achieves competitive performance with other state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.

arXiv:1505.02417v2 [stat.ME] 20 Oct 2015
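The two ingredients named in the abstract, an implicit update and averaging of the iterates, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear least-squares model (chosen because the implicit update then has a closed form), a learning-rate schedule `gamma1 * n ** (-decay)`, and hypothetical names such as `aisgd_least_squares`, `gamma1`, and `decay`.

```python
import numpy as np


def aisgd_least_squares(X, y, gamma1=1.0, decay=0.6):
    """Averaged implicit SGD for y = x @ theta + noise (illustrative sketch)."""
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)      # current implicit iterate
    theta_bar = np.zeros(n_features)  # running average of the iterates
    for n in range(1, n_samples + 1):
        x_n, y_n = X[n - 1], y[n - 1]
        gamma_n = gamma1 * n ** (-decay)  # assumed learning-rate schedule
        # Implicit update: theta_n = theta_{n-1} + gamma_n * (y_n - x_n @ theta_n) * x_n.
        # For squared loss this fixed-point equation solves in closed form,
        # shrinking the step by 1 / (1 + gamma_n * ||x_n||^2).
        residual = y_n - x_n @ theta
        theta = theta + gamma_n / (1.0 + gamma_n * (x_n @ x_n)) * residual * x_n
        # Averaging: theta_bar_n = theta_bar_{n-1} + (theta_n - theta_bar_{n-1}) / n.
        theta_bar += (theta - theta_bar) / n
    return theta_bar


# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(5000, 3))
y = X @ true_theta + 0.1 * rng.normal(size=5000)
estimate = aisgd_least_squares(X, y)
print(np.round(estimate, 2))
```

Because the step is divided by `1 + gamma_n * ||x_n||^2`, each update stays bounded even when `gamma1` is set far too large, which is the stability property the abstract attributes to the implicit update; an explicit update with the same rate could diverge.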

Similar resources

Towards stability and optimality in stochastic gradient descent

Lemmas 1, 2, 3 and 4, and Corollary 1, were originally derived by Toulis and Airoldi (2014). These intermediate results (and Theorem 1) provide the necessary foundation to derive Lemma 5 (only in this supplement) and Theorem 2 on the asymptotic optimality of θ̄n, which is the key result of the main paper. We fully state these intermediate results here for convenience but we point the reader to t...

Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities

The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and ad...

Towards Stability and Optimality in Stochastic Gradient Descent

Iterative procedures for parameter estimation based on stochastic gradient descent (sgd) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on sgd are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed a...

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

Journal: CoRR

Volume: abs/1505.02417

Publication date: 2015
